Conversation

Contributor

@gongweibao gongweibao commented Aug 4, 2021

PR types

Function optimization

PR changes

Others

Describe

  1. CheckNumerics is very slow.
  2. ReduceMeanD triggers a sum op error under fp16, so cast the input tensor to fp32 to avoid this.
  3. c_allreduce_sum doesn't core dump under fp16, but does core dump under fp32.
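The motivation for item 2 can be illustrated outside of Paddle: fp16 can represent values only up to 65504, so accumulating a reduction in fp16 overflows quickly, while casting to fp32 before reducing stays exact. A minimal numpy sketch (numpy stands in here for the NPU reduce kernels, which are not available in this context):

```python
import numpy as np

# fp16 (half precision) overflows above 65504, so summing many
# fp16 values in an fp16 accumulator runs off to inf.
x = np.ones(70000, dtype=np.float16)

fp16_sum = x.sum()                      # accumulated in fp16 -> overflows to inf
fp32_sum = x.astype(np.float32).sum()   # cast to fp32 first, as this PR does

print(fp16_sum)  # inf
print(fp32_sum)  # 70000.0
```

The same reasoning applies to a mean: the intermediate sum is the part that overflows, which is why the fix casts the input tensor to fp32 before the reduce op rather than after.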

@gongweibao gongweibao requested a review from zhiqiu August 5, 2021 11:36
@gongweibao gongweibao changed the title from "Fixchecknumeric" to "Use another method to void c_allreduce_sum core!" Aug 5, 2021
@gongweibao gongweibao changed the title from "Use another method to void c_allreduce_sum core!" to "[NPU]Use another method to void c_allreduce_sum core!" Aug 5, 2021
Contributor

@zhiqiu zhiqiu left a comment

LGTM

@gongweibao gongweibao merged commit c91b1e0 into PaddlePaddle:develop Aug 6, 2021
@gongweibao gongweibao deleted the fixchecknumeric branch August 6, 2021 03:32
@gongweibao gongweibao restored the fixchecknumeric branch August 8, 2021 14:20

3 participants